Abstract


We present a novel approach to sentence simplification which departs from
previous work in two main ways. First, it requires neither hand-written rules
nor a training corpus of aligned standard and simplified sentences. Second,
sentence splitting operates on deep semantic structure. We show (i) that the
unsupervised framework we propose is competitive with four state-of-the-art
supervised systems and (ii) that our semantic-based approach allows for a
principled and effective handling of sentence splitting.


1 Introduction 


Sentence simplification maps a sentence to a simpler, more readable one
approximating its content. As has been argued in (Shardlow, 2014), sentence
simplification has many potential applications. It is useful as a
preprocessing step for a variety of NLP systems such as parsers and machine
translation systems (Chandrasekar et al., 1996), summarisation (Knight and
Marcu, 2000), sentence fusion (Filippova and Strube, 2008) and semantic role
labelling (Vickrey and Koller, 2008). It also has wide-ranging potential
societal applications as a reading aid for people with aphasia (Carroll et
al., 1999), for low literacy readers (Watanabe et al., 2009) and for
non-native speakers (Siddharthan, 2002).

In this paper, we present a novel approach to sentence simplification which
departs from previous work in two main ways. First, it requires neither
hand-written rules nor a training corpus of aligned standard and simplified
sentences. Instead, we exploit non-aligned Simple and English Wikipedia to
learn the probability of lexical simplifications, of the semantics of simple
sentences and of optional phrases, i.e., phrases which may be deleted when
simplifying. Second, sentence splitting is semantic-based. We show (i) that
our unsupervised framework is competitive with four state-of-the-art systems
and (ii) that our semantic-based approach allows for a principled and
effective handling of sentence splitting.

2 Related Work 


Earlier work on sentence simplification relied on handcrafted rules to
capture syntactic simplification, e.g., to split coordinated and subordinated
sentences into several, simpler clauses or to model, e.g., active/passive
transformations (Siddharthan, 2002; Chandrasekar and Srinivas, 1997; Canning,
2002; Siddharthan, 2011; Siddharthan, 2010). While these hand-crafted
approaches can encode precise and linguistically well-informed syntactic
transformations, they do not account for lexical simplifications and their
interaction with the sentential context. Siddharthan and Mandya (2014)
therefore propose an approach where hand-crafted syntactic simplification
rules are combined with lexical simplification rules extracted from aligned
English and simple English sentences, and revision histories of Simple
Wikipedia.

Using the parallel dataset formed by Simple English Wikipedia (SWKP)¹ and
traditional English Wikipedia (EWKP)², further work has focused on developing
machine learning approaches to sentence simplification.

Zhu et al. (2010) constructed a parallel Wikipedia corpus (PWKP) of
108,016/114,924 complex/simple sentences by aligning sentences from EWKP and
SWKP and used the resulting bitext to train a simplification model inspired
by syntax-based machine translation (Yamada and Knight, 2001). Their
simplification model encodes the probabilities for four rewriting operations
on the parse tree of an input sentence, namely substitution, reordering,
splitting and deletion. It is combined with a language model to improve
grammaticality, and the decoder translates sentences into simpler ones by
greedily selecting the output sentence with the highest probability.

1 http://simple.wikipedia.org
2 http://en.wikipedia.org


Using both the PWKP corpus developed by Zhu et al. (2010) and the edit
history of Simple Wikipedia, Woodsend and Lapata (2011) learn a
quasi-synchronous grammar (Smith and Eisner, 2006) describing a loose
alignment between parse trees of complex and of simple sentences. Following
Dras (1999), they then generate all possible rewrites for a source tree and
use integer linear programming to select the most appropriate simplification.
They evaluate their model on the same dataset used by Zhu et al. (2010),
namely an aligned corpus of 100/131 EWKP/SWKP sentences.


Wubben et al. (2012), Coster and Kauchak (2011) and Xu et al. (2016) saw
simplification as a monolingual translation task where the complex sentence
is the source and the simpler one is the target. To account for deletions,
reordering and substitution, Coster and Kauchak (2011) trained a phrase-based
machine translation system on the PWKP corpus while modifying the word
alignment output by GIZA++ in Moses to allow for null phrasal alignments. In
this way, they allow for phrases to be deleted during translation. Similarly,
Wubben et al. (2012) used Moses and the PWKP data to train a phrase-based
machine translation system augmented with a post-hoc reranking procedure
designed to rank the outputs based on their dissimilarity from the source
sentence. Unlike Wubben et al. (2012) and Coster and Kauchak (2011), who used
machine translation as a black box, Xu et al. (2016) proposed to modify the
optimization function of SMT systems by tuning them for the sentence
simplification task. However, in their work they primarily focus on lexical
simplification.


Finally, Narayan and Gardent (2014) present a hybrid approach combining a
probabilistic model for sentence splitting and deletion with a statistical
machine translation system trained on PWKP for substitution and reordering.


Our proposal differs from all these approaches in that it does not use the
parallel PWKP corpus for training. Nor do we use hand-written rules. Another
difference is that we use a deep semantic representation as input for
simplification. While a similar approach was proposed in (Narayan and
Gardent, 2014), the probabilistic models differ in that we determine
splitting points based on the maximum likelihood of sequences of thematic
role sets present in SWKP whereas Narayan and Gardent (2014) derive the
probability of a split from the aligned EWKP/SWKP corpus using expectation
maximisation. As we shall see in Section 4, because their data is more
sparse, Narayan and Gardent (2014) predict fewer and lower-quality
simplifications by sentence splitting.


3 Simplification Framework 

Our simplification framework pipelines three dedicated modules inspired by
previous work on lexical simplification, syntactic simplification and
sentence compression. All three modules are unsupervised.

3.1 Example Simplification 

Before describing the three main modules of our simplification framework, we
illustrate its working with an example. Figure 1 shows the input semantic
representation associated with sentence (1C) and illustrates the successive
simplification steps yielding the intermediate and final simplified sentences
shown in (1S1-S).

(1) C. In 1964 Peter Higgs published his second paper in Physical Review
Letters describing Higgs mechanism which predicted a new massive spin-zero
boson for the first time.

S1 (Lex Simp). In 1964 Peter Higgs wrote his second paper in Physical Review
Letters explaining Higgs mechanism which predicted a new massive elementary
particle for the first time.

S2 (Split). In 1964 Peter Higgs wrote his second paper in Physical Review
Letters explaining Higgs mechanism. Higgs mechanism predicted a new massive
elementary particle for the first time.

S (Deletion). In 1964 Peter Higgs wrote his paper explaining Higgs mechanism.
Higgs mechanism predicted a new elementary particle.

First, the input (1C) is rewritten as (1S1) by replacing standard words with
simpler ones using the context-aware lexical simplification method proposed
in (Biran et al., 2011).

Splitting is then applied to the semantic representation of (1S1). Following
Narayan and Gardent (2014), we use Boxer³

3 http://svn.ask.it.usyd.edu.au/trac/candc
Figure 1: Simplification of “In 1964 Peter Higgs published his second paper
in Physical Review Letters describing Higgs mechanism which predicted a new
massive spin-zero boson for the first time.”
































































(Curran et al., 2007) to map the output sentence from the lexical
simplification step (here S1) to a Discourse Representation Structure (DRS,
(Kamp, 1981)). The DRS for S1 is shown at the top of Figure 1 and a graph
representation⁴ of the dependencies between its variables is shown
immediately below. In this graph, each DRS variable labels a node in the
graph and each edge is labelled with the relation holding between the
variables labelling its end vertices. The two tables to the right of the
picture show the predicates (top table) associated with each variable and the
relation label (bottom table) associated with each edge. Boxer also outputs
the associated positions in the complex sentence for each predicate (not
shown in the DRS but shown in the graph tables). Orphan words, i.e., words
which have no corresponding material in the DRS (e.g., which at position 16),
are added to the graph (node O1), thus ensuring that the position set
associated with the graph exactly generates the input sentence.

Using probabilities over sequences of thematic role sets acquired from the
DRS representations of SWKP, the split module determines where and how to
split the input DRS. In this case, one split is applied between X11 (explain)
and X10 (predict). The simpler sentences resulting from the split are then
derived from the DRS using the word order information associated with the
predicates, duplicating or pronominalising any shared element (e.g., Higgs
mechanism in Figure 1) and deleting any orphan words (e.g., which) which
occur at the split boundary. Splitting thus derives S2 from S1.

Finally, deletion or sentence compression applies, transforming S2 into S3.


3.2 Context-Aware Lexical Simplification 

We extract context-aware lexical simplification rules from EWKP and SWKP⁵
using the approach described by Biran et al. (2011). The underlying intuition
behind these rules is that the word C from EWKP can be replaced with a word S
from SWKP

4 The DRS to graph conversion goes through several preprocessing steps: the
relation nn is inverted, making the modifier noun (higgs) dependent on the
modified noun (mechanism); named and timex are converted to unary predicates,
e.g., named(x, peter) is mapped to peter(x) and timex(x) = 1964 is mapped to
1964(x); and nodes are introduced for orphan words (e.g., which).

5 We downloaded the snapshots of English Wikipedia dated 2013-12-31 and of
Simple English Wikipedia dated 2014-01-01 available at
http://dumps.wikimedia.org


if C and S share similar contexts (ten token window) in EWKP and SWKP
respectively. Given an input sentence and the set of simplification rules
extracted from EWKP and SWKP, we then consider all possible (C, S)
substitutions licensed by the extracted rules and we identify the best
combination of lexical simplifications using dynamic programming and rule
scores which capture the adequacy, in context, of each possible
substitution⁶.
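To make the notion of "similar contexts" concrete, the following is a minimal sketch that builds bag-of-words context vectors and compares them with cosine similarity. It is an illustration only: the function names, the reading of the ten token window as ten tokens on either side, the whitespace tokenisation and the two toy sentences are all assumptions, and the actual rule extraction and scoring of Biran et al. (2011) is not reproduced here.

```python
from collections import Counter
from math import sqrt


def context_vector(corpus, word, window=10):
    """Count the tokens seen within `window` tokens of each occurrence of `word`.

    `window` is interpreted as a number of tokens on either side (an assumption).
    """
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy stand-ins for EWKP and SWKP (hypothetical data, one sentence each).
ewkp = ["peter higgs published his second paper in 1964"]
swkp = ["peter higgs wrote his second paper in 1964"]

similarity = cosine(context_vector(ewkp, "published"), context_vector(swkp, "wrote"))
```

In this toy case the two words occur in identical contexts, so the similarity is maximal; on real corpora a threshold on such a score would decide whether a (C, S) pair is kept as a candidate rule.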


3.3 Sentence Splitting 


A distinguishing feature of our approach is that splitting is based on deep
semantic representations rather than phrase structure trees, as in (Zhu et
al., 2010; Woodsend and Lapata, 2011), or dependency trees, as in
(Siddharthan and Mandya, 2014). While Woodsend and Lapata (2011) report
learning 438 splitting rules for their simplification approach operating on
phrase structure trees, Siddharthan and Mandya (2014) define 26 hand-crafted
rules for simplifying apposition and/or relative clauses in dependency
structures and 85 rules to handle subordination and coordination.

In contrast, we do not need to specify or to learn complex rewrite rules for
splitting a complex sentence into several simpler sentences. Instead, we
simply learn the probability of sequences of thematic role sets likely to
co-occur in a simplified sentence.

The intuition underlying our approach is that: 


Semantic representations give a clear handle on events, on their associated
role sets and on shared elements, thereby facilitating both the
identification of possible splitting points and the reconstruction of shared
elements in the sentences resulting from a split.

For instance, the DRS in Figure 1 makes clear that sentence (1S1) contains 3
main events and that Higgs mechanism is shared between two propositions.

To determine whether and where to split the input sentence, we use a
probabilistic model trained on the DRSs of the Simple Wikipedia sentences and
a language model also trained on Simple Wikipedia. Given the event variables
contained in the DRS of the input sentence, we consider

6 For more details on the extraction of lexical simplification rules, we
refer the reader to Biran et al. (2011). For more details on the application
of these rules using dynamic programming, we refer the reader to Narayan
(2014).





















all possible splits between subsequences of events and choose the split(s)
with maximum split score. For instance, in the sentence shown in Figure 1,
there are three event variables X3, X10 and X11 in the DRS. So we will
consider 5 split possibilities, namely no split ({X3, X10, X11}), two splits
resulting in three sentences describing an event each ({X3}, {X10}, {X11})
and one split resulting in two sentences describing one and two events
respectively (i.e., ({X3}, {X10, X11}), ({X3, X10}, {X11}) and ({X10},
{X3, X11})). The split ({X10}, {X3, X11}) gets the maximum split score and is
chosen to split the sentence (1S1), producing the sentences (1S2).
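The five candidates listed above are exactly the ways of grouping three event variables into non-empty blocks. A minimal sketch of this enumeration, followed by the selection of the highest-scoring candidate, is given below. The function names are hypothetical, and the scores 0.9 and 0.1 are placeholders chosen only so that the example picks the split reported in the text; they are not values produced by the system.

```python
def set_partitions(items):
    """Yield every grouping of `items` into non-empty, unordered blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new block containing only `first`.
        yield [[first]] + partition


def canonical(partition):
    """Order-independent key for a partition."""
    return frozenset(frozenset(block) for block in partition)


def best_split(events, score):
    """Return the candidate partition with the maximum score."""
    return max(set_partitions(events), key=score)


events = ["X3", "X10", "X11"]
candidates = list(set_partitions(events))

# Placeholder scores (hypothetical): favour the split reported in the text.
toy_scores = {canonical([["X10"], ["X3", "X11"]]): 0.9}
chosen = best_split(events, lambda p: toy_scores.get(canonical(p), 0.1))
```

Since the number of set partitions grows quickly with the number of events, exhaustive enumeration of this kind is only practical because a single sentence rarely contains more than a handful of events.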


Semantic Pattern                                  prob.
((agent, patient))                                0.059
((agent, in, in, patient))                        0.002
((agent, patient), (agent, in, in, patient))      0.023

Table 1: Split Feature Table (SFT) showing some of the semantic patterns from
Figure 1.
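The split score defined in this section can be read as the average, over the output sentences, of a length-balance term multiplied by the language model probability and the SFT likelihood of each sentence. The sketch below follows that reading. The function name and argument layout are assumptions, the language model probability 0.5 is an arbitrary placeholder, and 0.059 is the Table 1 likelihood of the pattern ((agent, patient)).

```python
def split_score(source_length, parts):
    """Average of balance * lm * sft over the output sentences.

    `parts` is a list of (length, lm_prob, sft_prob) triples, one per output
    sentence; `source_length` is the length of the sentence being split.
    """
    n = len(parts)
    l_split = source_length / n  # average length of the split sentences
    total = 0.0
    for length, lm_prob, sft_prob in parts:
        # Equals 1 when the sentence has exactly the average length and
        # shrinks as the sentence length departs from it.
        balance = l_split / (l_split + abs(l_split - length))
        total += balance * lm_prob * sft_prob
    return total / n


# Same placeholder probabilities, different length distributions.
balanced = split_score(20, [(10, 0.5, 0.059), (10, 0.5, 0.059)])
unbalanced = split_score(20, [(2, 0.5, 0.059), (18, 0.5, 0.059)])
```

With identical probabilities, the balanced split scores higher than the unbalanced one, which illustrates the preference for sub-sentences of roughly equal length.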


Formally, the split score $P_{split}$ associated with the splitting of a
sentence $S$ into a sequence of sentences $s_1 \ldots s_n$ is defined as:

$$P_{split} = \frac{1}{n} \sum_{s_i} \frac{L_{split}}{L_{split} + |L_{split} - L_{s_i}|} \times lm_{s_i} \times SFT_{s_i}$$

where $n$ is the number of sentences produced after splitting; $L_{split}$ is
the average length of the split sentences ($L_{split} = L_S / n$, where $L_S$
is the length of the sentence $S$); $L_{s_i}$ is the length of the sentence
$s_i$; $lm_{s_i}$ is the probability of $s_i$ given by the language model;
and $SFT_{s_i}$ is the likelihood of the semantic pattern associated with
$s_i$. The Split Feature Table (SFT, Table 1) is derived from the corpus of
DRSs associated with the SWKP sentences and the counts of sequences of
thematic role sets licensed by the DRSs of SWKP sentences. Intuitively,
$P_{split}$ favors splits involving frequent semantic patterns (frequent
sequences of thematic role sets) and sub-sentences of roughly equal length.
This way of semantic pattern based splitting also avoids over-splitting of a
complex sentence.


3.4 Phrasal Deletion


Following Filippova and Strube (2008), we formulate phrase deletion as an
optimization problem which is solved using integer linear programming⁷. Given
the DRS $K$ associated with a sentence to be simplified, for each relation
$r \in K$, the deletion module determines whether $r$ and its associated DRS
subgraphs should be deleted by maximising the following objective function:

$$\sum_{x} x^{r}_{h,w} \times P(r|h) \times P(w) \qquad r \notin \{agent, patient, theme, eq\}$$

where for each relation $r \in K$, $x^{r}_{h,w} = 1$ if $r$ is preserved and
$x^{r}_{h,w} = 0$ otherwise; $P(r|h)$ is the conditional probability
(estimated on the DRS corpus derived from SWKP) of $r$ given the head label
$h$; and $P(w)$ is the relative frequency of $w$ in SWKP⁸.

Intuitively, this objective function will favor obligatory dependencies over
optional ones and simple words (i.e., words that are frequent in SWKP). In
addition, the objective function is subjected to constraints which ensure (i)
that some deletion takes place and (ii) that the resulting DRS is a
well-formed graph.

7 In our implementation, we use lp_solve, 

http://sourceforge.net/projects/lpsolve 
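The shape of the deletion problem can be illustrated on a toy graph. The sketch below replaces the integer linear program (solved with lp_solve in the paper) by brute-force enumeration of keep/delete decisions, which is only feasible for very small inputs. Everything in it is an assumption made for illustration: the relation identifiers, the label mod, the probabilities, the choice to always keep the core relations agent, patient, theme and eq, and the reading of well-formedness as "a kept relation may not hang under a deleted one".

```python
from itertools import product

CORE = {"agent", "patient", "theme", "eq"}  # assumed never to be deleted


def best_deletion(relations):
    """Pick keep/delete flags maximising sum(x * P(r|h) * P(w)) over optional relations.

    `relations` maps a relation id to (label, parent_id or None, p_r_given_h, p_w).
    """
    optional = [rid for rid, (label, *_rest) in relations.items() if label not in CORE]
    best, best_value = None, float("-inf")
    for bits in product((0, 1), repeat=len(optional)):
        if all(bits):
            continue  # constraint (i): at least one relation must be deleted
        keep = {rid: 1 for rid in relations}
        keep.update(zip(optional, bits))
        if any(keep[rid] and parent is not None and not keep[parent]
               for rid, (_label, parent, _p, _q) in relations.items()):
            continue  # constraint (ii): no kept relation under a deleted one
        value = sum(keep[rid] * p_rh * p_w
                    for rid, (label, _parent, p_rh, p_w) in relations.items()
                    if label not in CORE)
        if value > best_value:
            best, best_value = keep, value
    return best, best_value


# Hypothetical toy graph; all probabilities are made up.
toy = {
    "e1": ("agent", None, 0.9, 0.010),
    "e2": ("patient", None, 0.9, 0.010),
    "e3": ("in", None, 0.3, 0.020),
    "e4": ("mod", "e2", 0.2, 0.005),
    "e5": ("for", None, 0.1, 0.001),
}
keep, value = best_deletion(toy)
```

On this toy input the relation with the smallest weight is the one removed. With only the two constraints stated in the text and positive weights, a single deletion is always optimal, so the behaviour of the full system on real sentences depends on details that this sketch does not capture.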


4 Evaluation 

We evaluate our approach both globally and by module, focusing in particular
on the splitting component of our simplification approach.


4.1 Global evaluation 


The test set provided by Zhu et al. (2010) was used by four supervised
systems for automatic evaluation using metrics such as BLEU, sentence length
and number of edits. In addition, most recent simplification approaches carry
out a human evaluation on a small set of randomly selected complex/simple
sentence pairs. Thus Wubben et al. (2012), Narayan and Gardent (2014) and
Siddharthan and Mandya (2014) carry out a human evaluation on 20, 20 and 25
sentences respectively.

Accordingly, we perform an automatic comparative evaluation using the test
set of Zhu et al. (2010), namely an aligned corpus of 100/131 EWKP/SWKP
sentences; and we carry out a human-based evaluation.


8 To account for modifiers which are represented as predicates on nodes
rather than relations, we preprocess the DRSs and transform each of these
predicates into a single node subtree of the node it modifies. For example,
in Figure 1, the node X2 labeled with the modifier predicate second is
updated to a new node X2 dominating a child labeled with that predicate and
related to X2 by a modifier relation.





















           Levenshtein edit distance (LD)
System     Complex to System   System to Simple   BLEU w.r.t.  Sentences    Avg. sentence  Avg. token
           LD      No edit     LD      No edit    simple       with splits  length         length
GOLD       12.24   3           0       100        100          28           27.80          4.40
Zhu        7.87    2           14.64   0          37.4         80           24.21          4.38
Woodsend   8.63    24          16.03   2          42           63           28.10          4.50
Wubben     3.33    6           13.57   2          41.4         1            28.25          4.41
Narayan    6.32    4           11.53   3          53.6         10           26.24          4.36
UNSUP      6.75    3           14.29   0          38.47        49           26.22          4.40

Table 2: Automatic evaluation results. Zhu, Woodsend, Wubben and Narayan are
the best outputs of the models of Zhu et al. (2010), Woodsend and Lapata
(2011), Wubben et al. (2012) and Narayan and Gardent (2014) respectively.
UNSUP is our model.


                         Levenshtein edit distance (LD)
System                   Complex to System   System to Simple   BLEU w.r.t.        Avg. sentence  Avg. token
                         LD      No edit     LD      No edit    complex   simple   length         length
complex                  0       100         12.24   3          100       49.85    27.80          4.62
LexSimpl                 2.07    22          13.00   1          82.05     44.29    27.80          4.46
Split                    2.27    51          13.62   1          89.70     46.15    29.10          4.63
Deletion                 2.39    4           12.34   0          85.15     47.33    25.41          4.54
LexSimpl-Split           4.43    11          14.39   0          73.20     41.18    29.15          4.48
LexSimpl-Deletion        4.29    3           13.09   0          69.84     41.91    25.42          4.38
Split-Deletion           4.63    4           13.42   0          77.82     43.44    26.19          4.55
LexSimpl-Split-Deletion  6.75    3           14.29   0          63.41     38.47    26.22          4.40
GOLD (simple)            12.24   3           0       100        49.85     100      23.38          4.40

Table 3: Automated metrics for simplification: modular evaluation.
LexSimpl-Split-Deletion is our final system UNSUP.


Automatic Evaluation. Following Wubben et al. (2012), Zhu et al. (2010) and
Woodsend and Lapata (2011), we use metrics that are directly related to the
simplification task, namely the number of splits in the overall data, the
number of output sentences with no edits (i.e., sentences which have not been
simplified) and the average Levenshtein distance (LD) between the system
output and both the complex and the simple reference sentences. We use BLEU⁹
as a means to evaluate how close the systems' outputs are to the reference
corpus.
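For reference, Levenshtein distance is the minimum number of insertions, deletions and substitutions needed to turn one sequence into another. The sketch below is the standard dynamic programming formulation, applied here to token sequences; whether the reported LD figures are computed over tokens or characters is not stated, so the token-level usage is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Token-level comparison on a fragment of example S1 from the test set.
complex_tokens = "the array distributes data across multiple disks".split()
system_tokens = "the array moves data across disks".split()
distance = levenshtein(complex_tokens, system_tokens)
```

Here one substitution (distributes to moves) and one deletion (multiple) account for the whole difference. The function accepts any pair of sequences, so passing two strings yields the character-level distance instead.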

Table 2 shows the results of the automatic evaluation. The most noticeable
result is that our unsupervised system yields results that are similar to
those of the supervised approaches.

The results also show that, in contrast to the Woodsend system, which often
leaves the input unsimplified (24% of the input), our system almost always
modifies the input sentence (only 3% of the input sentences are not
simplified); and that the number of simplifications including a split is
relatively high (49% of the cases), suggesting a good ability to split
complex sentences into simpler ones.

Human Evaluation. Human judges were asked to rate input/output pairs w.r.t.
adequacy (How much does the simplified sentence(s) preserve the meaning of
the input?), simplification (How much does the generated sentence(s) simplify
the complex input?) and fluency (How grammatical and fluent are the
sentences?).

We randomly selected 18 complex sentences from Zhu's test corpus and included
in the evaluation corpus: the corresponding simple (Gold) sentence from Zhu's
test corpus, the output of our system (UNSUP) and the output of the other
four systems (Zhu, Woodsend, Narayan and Wubben), which were provided to us
by the system authors¹⁰. We collected ratings from 18 participants. All were
either native speakers or proficient in English, having taken part in a
Master's programme taught in English or lived in an English-speaking country
for an extended period of time. The evaluation was done online using the
LG-Eval toolkit (Kow and Belz, 2012)¹¹ and a Latin Square Experimental Design
(LSED) was used to ensure a fair distribution of the systems and the data
across raters.
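One simple way to obtain such a balanced distribution is a cyclic Latin-square-style rotation of systems over raters and items. The sketch below is a hypothetical layout, not the one generated by LG-Eval: it assumes that every rater sees all 18 source sentences once and that six outputs (GOLD plus five systems) are rotated across them.

```python
SYSTEMS = ["GOLD", "Zhu", "Woodsend", "Wubben", "Narayan", "UNSUP"]


def latin_square_assignment(n_raters, n_items, systems):
    """plan[r][i] is the system whose output rater r judges for item i."""
    k = len(systems)
    # Shifting the system index by one per rater and per item rotates the
    # systems evenly across both dimensions.
    return [[systems[(r + i) % k] for i in range(n_items)] for r in range(n_raters)]


plan = latin_square_assignment(18, 18, SYSTEMS)
```

Under these assumptions each rater judges every system three times, and each system's output for a given sentence is judged by three different raters.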

Table 4 shows the average ratings of the human evaluation on a scale from 0
to 5. Pairwise comparisons between all models and their statistical
significance were carried out using a one-way ANOVA with post-hoc Tukey HSD
tests.


9 Moses support tools: multi-bleu
http://www.statmt.org/moses/?n=Moses.SupportTools

10 We upload the outputs from all the systems as supplementary material with
this paper.

11 http://www.nltg.brighton.ac.uk/research/lg-eval/



































System pairs      Average score (number of split sentences)
A      B          ALL-A     ALL-B     ONLY-A    BOTH-AB             ONLY-B
                                                A         B
UNSUP  GOLD       2.37(49)  3.85(28)  2.15(32)  2.80(17)  3.70(17)  4.05(11)
UNSUP  Zhu        2.37(49)  2.25(80)  1.53(4)   2.45(45)  2.42(45)  2.02(35)
UNSUP  Woodsend   2.37(49)  2.08(63)  2.42(11)  2.36(38)  2.29(38)  1.78(25)
UNSUP  Wubben     2.37(49)  2.73(1)   2.32(48)  4.75(1)   2.73(1)   0(0)
UNSUP  Narayan    2.37(49)  2.09(10)  2.29(41)  2.78(8)   1.79(8)   3.81(2)

Table 5: Pairwise split evaluation: Each row shows the pairwise comparison of
the quality of splits in UNSUP and some other system. The last six columns
show the average scores and number of associated split sentences. The second
column (ALL-A) and the third column (ALL-B) present the quality of all splits
by systems A and B respectively. The fourth column (ONLY-A) represents
sentences where A splits but not B. The fifth and sixth columns represent
sentences where both A and B split. The seventh column (ONLY-B) represents
sentences where B splits but not A.


Systems    Simplicity  Fluency  Adequacy
GOLD       3.62        4.69     3.80
Zhu        2.62        2.56     2.47
Woodsend   1.69        3.15     3.15
Wubben     1.52        3.05     3.38
Narayan    2.30        3.03     3.35
UNSUP      2.83        3.56     2.83

Table 4: Average human ratings for simplicity, fluency and adequacy.

If we group together systems for which there is no significant difference
(significance level: p < 0.05), our system is in the first group together
with Narayan and Zhu for simplicity; in the first group for fluency; and in
the second group for adequacy (together with Woodsend and Zhu). A manual
examination of the results indicates that UNSUP achieves good simplicity
rates through both deletion and sentence splitting. Indeed, the average
length (in words) of simplified sentences is smaller for UNSUP (26.22) than
for Wubben (28.25) and Woodsend (28.10); comparable with Narayan (26.19) and
higher only than Zhu (24.21).

4.2 Modular Evaluation 

To assess the relative impact of each module (lexical simplification,
deletion and sentence splitting), we also conduct an automated evaluation on
each module separately. The results are shown in Table 3.

One first observation is that each module has an impact on simplification.
Thus the average Levenshtein edit distance (LD) to the source clause
(complex) is never null for any module while the number of “No edit”
indicates that lexical simplification modifies the input sentence in 78% of
the cases, sentence splitting in 49% and deletion in 96%.

In terms of output quality and, in particular, similarity with respect to the
target clause, deletion is the most effective (smallest LD, best BLEU score
w.r.t. target). Further, the results for average token length indicate that
lexical simplification is effective in producing shorter words (smaller
average length for this module compared to the other two modules).

Predictably, combining modules yields systems that have a stronger impact on
the source clause (higher LD to complex, lower number of No Edits), with the
full system (i.e., the system combining the 3 modules) showing the largest LD
to the sources (LD to complex) and the smallest number of source sentences
without simplification (3 No Edits).

4.3 Sentence Splitting Using Deep Semantics 

To compare our sentence splitting approach with existing systems, we
collected, in a second human evaluation, all the outputs for which at least
one system applied sentence splitting. The raters were then asked to compare
pairs of split sentences produced by two distinct systems and to evaluate the
quality (0: very bad to 5: very good) of these split sentences, taking into
account boundary choice, sentence completion and sentence reordering.

Table 5 shows the results of this second evaluation. For each system pair
comparing UNSUP (A) with another system (B), the table gives the scores and
the number of splits of both systems: for the inputs on which both systems
split (BOTH-AB), on which only UNSUP splits (ONLY-A) and on which only the
compared system splits (ONLY-B).

UNSUP achieves a better average score (ALL-A = 2.37) than all other systems
(ALL-B column) except Wubben (2.73). However, Wubben only achieves one split
and on that sentence, UNSUP's score is 4.75 while Wubben has a score of 2.73
and produces an incorrect split (cf. S3 in Table 6).

In terms of numbers of splits, three systems often simplify by splitting,
namely Zhu (80 splits), Woodsend (63) and UNSUP (49). Interestingly, Narayan,
trained on the parallel corpus of Wikipedia and Simplified Wikipedia, splits
less often (10 splits vs 49 for UNSUP) and less well (2.09 average score
versus 2.37 for UNSUP). This is unsurprising as the proportion of splits in
SWKP was reported in (Narayan and Gardent, 2014) to be a low 6%. In contrast,
the set of observations we use to learn the splitting probability is the set
of all sequences of thematic role sets derived from the DRSs of the SWKP
corpus.

S1

Complex. This array distributes data across multiple disks, but the array is seen by the computer user and operating
system as one single disk.

Zhu. This array sells data across multiple disks but the array is seen. The computer user and operating as one disk. [0]

Woodsend. This array distributes data across several disks. The array is seen by the computer user and operating
system as one single disk. [4.85]

UNSUP. This array moves data across disks. The array is seen by the computer user and operating system as one disk.
[4.83]

S2

Complex. Here, simulations of evolution using evolutionary algorithms and artificial life started with the work of
Nils Aall Barricelli in the 1960s, and was extended by Alex Fraser, who published a series of papers on simulation of
artificial selection.

Zhu. Here, simulations of evolution using evolutionary algorithms and artificial life started with the work of nils aall
barricelli in the 1960s, and was extended by alex fraser. Alex fraser published a series of papers on feeling of artificial
selection. [3.6]

Woodsend. It has Genetic algorithms. Evolution is using artificial life. This is Nils Aall Barricelli in the 1960s. Here,
simulations started with the work. He was extended by Alex Fraser. He published a group of papers on simulation of
artificial selection. [0.33]

UNSUP. Here, models of evolution using evolutionary algorithms and artificial life started with the work in the 1960s,
and was led by Alex Fraser. Alex Fraser wrote a series of papers on model of selection. [5]

S3

Complex. By 1928, the regional government was moved from the old Cossack capital Novocherkassk to Rostov, which
also engulfed the nearby Armenian town of Nor Nakhijevan.

Zhu. By 1928, the government was moved from the old cossack capital novocherkassk to rostov. Rostov also of the
city the nearby armenian town of nor nakhijevan. [2.8]

Woodsend. By 1928, the regional government was moved from the old Cossack capital Novocherkassk to Rostov. Both
also engulfed the nearby Armenian town of Nor Nakhijevan. [3]

Wubben. by 1928 , the regional government was moved from the old cossack capital novocherkassk to rostov. the
nearby armenian town of nor nakhichevan. [2.7]

Narayan. by 1928, the regional government was moved from the old cossack capital novocherkassk to rostov. rostov
that engulfed the nearby armenian town of nor nakhichevan. [2.7]

UNSUP. The regional government was moved from the old Cossack capital Novocherkassk to Rostov. Rostov also
absorbed the nearby town of Nor Nakhijevan. [4.75]


Table 6: Example outputs for sentence splitting with their average human annotation scores.

In sum, the unsupervised, semantic-based splitting strategy allows for a high
number (49%) of good quality (2.37 score) sentence splits. Because there are
fewer possible patterns of thematic role sets in simple sentences than
possible configurations of parse/dependency trees for complex sentences, it
is less prone to data sparsity than the syntax-based approach. Because the
probabilities learned are not tied to specific syntactic structures but to
more abstract semantic patterns, it is also perhaps less sensitive to parse
errors.


4.4 Examples from the Test Set 

Table 6 shows some examples from the evaluation dataset which were selected
to illustrate the workings of our approach and to help interpret the results
in Tables 2, 5 and 3.

S1, S2 and S3 show examples of context-aware unsupervised lexical
substitutions which are nicely performed by our system. In S1, “The array
distributes data” is correctly simplified to “The array moves data” whereas
Zhu's system incorrectly simplifies this clause to “The array sells data”.
Similarly, in S2, our system correctly simplifies “Papers on simulation of
artificial selection” to “Papers on models of selection” while the other
systems either do not simplify or simplify to “Papers on feeling”.

For splitting, the examples show two types of splitting performed by our
approach, namely splitting of coordinated sentences (S1) and splitting
between a main and a relative clause (S2, S3). S2 illustrates how the
Woodsend system over-splits, an issue already noticed in (Siddharthan and
Mandya, 2014); and how Zhu's system predicts an incorrect split between a
verb (seen) and its agent argument (by the user). Barring a parse error, such
incorrect splits will not be predicted by our approach since, in our case,
splits only occur between (verbalisations of) events. S1, S2 and S3 also
illustrate how our semantic-based approach allows for an adequate
reconstruction of shared elements.


5 Conclusion 


A major limitation for supervised simplification systems is the limited
amount of available parallel standard/simplified data. In this paper, we have
shown that it is possible to take an unsupervised approach to sentence
simplification which requires a large corpus of standard and simplified
language but no alignment between the two. This allowed for the
implementation of a contextually aware substitution module; and for a simple,
linguistically principled account of sentence splitting and shared element
reconstruction.